Fix flaky :plugins:repository-azure:internalClusterTest by waiting for Azure fixture readiness and cleaning stale state#20171
Conversation
WalkthroughAdds JUnit lifecycle methods to the Azure repository internalClusterTest to wait for an Azure container to be ready before tests and to list/delete blobs after tests; also makes setUp/tearDown public overrides that ensure cluster health and remove indices and repositories. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant JUnit as JUnit Runner
participant TestClass as AzureStorageCleanupThirdPartyTests
participant AzureSvc as AzureStorageService
participant Container as Azure Container
Note over JUnit,TestClass: `@BeforeClass` — readiness
JUnit->>TestClass: call waitForAzureFixtureReady()
TestClass->>AzureSvc: init(credentials from sysprops)
TestClass->>Container: getContainerClient()
loop poll until ready or timeout
TestClass->>Container: container.exists()
Container-->>TestClass: exists? (true/false)
end
Note over JUnit,TestClass: Tests execute...
Note over JUnit,TestClass: `@AfterClass` — cleanup
JUnit->>TestClass: call cleanupAzureBlobs()
TestClass->>AzureSvc: listBlobs(container)
loop for each blob
TestClass->>Container: deleteBlob(blob)
Container-->>TestClass: deletion result
end
TestClass->>AzureSvc: close()
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (2)📚 Learning: 2025-12-12T18:40:08.452ZApplied to files:
📚 Learning: 2025-12-02T22:44:14.799ZApplied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (4)
Comment |
|
❌ Gradle check result for 49ca72f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (2)
72-86: Fixture readiness check is solid; consider aligning withcredentials()or documenting behaviorThe
@Beforehook nicely encapsulates the flakiness fix by:
- Ensuring the node is green before tests run.
- Using
assertBusywith a bounded timeout sotest.azure.container/test.azure.basemust become non‑blank within 60s or fail fast.Two optional refinements to consider:
- To keep the readiness condition exactly in sync with what the test later requires, you could either:
- Also include
test.azure.accountand the key/SAS token checks here, or- Call
credentials()inside theassertBusyblock (if it has no side effects beyond validations), so the same invariants gate fixture readiness and repository creation.- A brief comment explaining that these system properties can transition from blank to non‑blank after JVM startup (due to the Gradle Azure fixture) would make this less “mysterious” to future readers.
These are nice‑to‑have; the current implementation already addresses the reported flakiness.
90-98: Double‑check interaction with superclasstearDownand robustness of wildcard deletesThe additional cleanup is directionally right (preventing state leakage across runs), but there are a couple of subtle points worth verifying:
Deleting all repositories before
super.tearDown()
AbstractThirdPartyRepositoryTestCasedefines its owntearDown()that historically:
- Cleans up the blob store for the default
"test-repo".- Deletes
"test-repo"via the cluster admin client. (jar-download.com)By calling
client().admin().cluster().prepareDeleteRepository("_all").get();first, you may be removing"test-repo"beforesuper.tearDown()runs, depending on the current framework implementation. If the superclass still assumes that repository exists (e.g., viagetRepository()),super.tearDown()could start failing withRepositoryMissingException.To avoid that risk, consider one of:
- Limiting deletion here to indices only and letting
AbstractThirdPartyRepositoryTestCase.tearDown()remain the sole authority for repo cleanup.- Or, if you really need to clear all repositories, ensuring that the superclass no longer depends on
"test-repo"being present (and documenting that assumption in a comment).Wildcard index delete edge cases
prepareDelete("*")can behave differently depending on index options and whether only system/hidden indices are present. In some configurations this may throw if there are no matching indices or if deletes of system indices are forbidden, which would turn into teardown failures rather than just leftover state.If you want teardown to be “best‑effort” rather than fail the test suite when nothing is left to delete, you could catch and ignore the specific exceptions that represent “nothing to delete / forbidden system index” while still surfacing unexpected errors.
Given this is test‑only, these adjustments are about making the cleanup more future‑proof and less coupled to internal assumptions of the base class.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (1)
34-36: Imports correctly reflect new setup/teardown behaviorThe added imports for
@Before,assertBusy, andTimeUnitmatch the new logic below and keep dependencies localized to test-only utilities. No issues here.
|
❌ Gradle check result for 72ed735: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for da47b31: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
The failure appears unrelated to this change. The failing task is All Azure tests passed, spotless and compilation passed. |
|
Thanks @Gautam-aman for fixing this test. Can you ensure your commits are signed off correctly. This should give you context on how to: https://github.com/opensearch-project/OpenSearch/pull/20171/checks?check_run_id=57257172591 |
|
Hi @sandeshkr419, |
|
❌ Gradle check result for da47b31: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
This failure is unrelated to the changes in this PR. The failing jobs are Jenkins / BWC infra:
There are no test or compilation failures introduced by this PR. |
|
@Gautam-aman Looks like a compiler failure to me: Also check out the failed DCO workflow and make sure all commits are properly signed. |
da47b31 to
9e8ae03
Compare
|
Addressed the compile-time error pointed out earlier by correcting the AtomicReference usage. Rebased, signed off, and force-pushed to the PR branch. Requesting re-review. |
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.
Applied to files:
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
📚 Learning: 2025-12-12T18:40:08.452Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20017
File: plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java:256-256
Timestamp: 2025-12-12T18:40:08.452Z
Learning: In the OpenSearch ReactorNetty4 secure HTTP transport tests (plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java), URI limit validation has been moved from the protocol layer to the transport layer, making it protocol-agnostic. The random protocol selection in ReactorHttpClient.https(settings) is intentional to ensure all tests validate correct behavior across HTTP/1.1, HTTP/2, and HTTP/3.
Applied to files:
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (2)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (2)
52-52: LGTM: Imports resolve the reported compiler error.The addition of these imports, particularly
AtomicReferenceon line 57, addresses the "cannot find symbol" compiler error mentioned in the PR comments.Also applies to: 56-57, 62-62
82-90: LGTM: Teardown properly cleans stale state.The try-finally block ensures
super.tearDown()is always called, and the aggressive cleanup (deleting all indices and repositories) is appropriate for preventing stale state in integration tests.
...alClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
Outdated
Show resolved
Hide resolved
...alClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
Outdated
Show resolved
Hide resolved
|
❌ Gradle check result for 9e8ae03: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 4c38bfc: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for d4ebd9a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (1)
70-76: Consider checking all required Azure properties in the fixture readiness check.The
waitForAzureFixtureReady()method currently verifies onlytest.azure.containerandtest.azure.base, but thecredentials()method (lines 111-119) also requirestest.azure.accountand eithertest.azure.keyortest.azure.sas_token. While the fixture likely sets all properties together, explicitly checking fortest.azure.accountwould catch incomplete fixture initialization earlier and provide clearer failure diagnostics.🔎 Optional enhancement to check all required properties
@BeforeClass public static void waitForAzureFixtureReady() throws Exception { assertBusy(() -> { + assertThat(System.getProperty("test.azure.account"), not(blankOrNullString())); assertThat(System.getProperty("test.azure.container"), not(blankOrNullString())); assertThat(System.getProperty("test.azure.base"), not(blankOrNullString())); }, 60, TimeUnit.SECONDS); }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-12T18:40:08.452Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20017
File: plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java:256-256
Timestamp: 2025-12-12T18:40:08.452Z
Learning: In the OpenSearch ReactorNetty4 secure HTTP transport tests (plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java), URI limit validation has been moved from the protocol layer to the transport layer, making it protocol-agnostic. The random protocol selection in ReactorHttpClient.https(settings) is intentional to ensure all tests validate correct behavior across HTTP/1.1, HTTP/2, and HTTP/3.
Applied to files:
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.
Applied to files:
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (2)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (2)
78-82: LGTM! Good practice to ensure cluster health before tests.The
setUp()override correctly callssuper.setUp()first to ensure proper initialization, then callsensureGreen()to wait for cluster health stability. This should help prevent test failures due to cluster state issues.
85-93: LGTM! Proper cleanup to prevent stale state between tests.The
tearDown()override correctly cleans up test artifacts (indices and repositories) before callingsuper.tearDown()in a finally block. This cleanup approach directly addresses the flaky test issue mentioned in the PR objectives by ensuring no stale CI state persists between test runs. The try-finally pattern ensures proper cleanup even if deletion operations fail.
|
❌ Gradle check result for 413bfdd: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-12T18:40:08.452Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20017
File: plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java:256-256
Timestamp: 2025-12-12T18:40:08.452Z
Learning: In the OpenSearch ReactorNetty4 secure HTTP transport tests (plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java), URI limit validation has been moved from the protocol layer to the transport layer, making it protocol-agnostic. The random protocol selection in ReactorHttpClient.https(settings) is intentional to ensure all tests validate correct behavior across HTTP/1.1, HTTP/2, and HTTP/3.
Applied to files:
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.
Applied to files:
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (1)
78-82: LGTM!Correctly waits for cluster health after initialization. The
ensureGreen()call ensures the cluster is ready before tests execute, addressing the readiness concern mentioned in the PR objectives.
...alClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
Show resolved
Hide resolved
...alClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
Show resolved
Hide resolved
|
❌ Gradle check result for d7fb0cf: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Spotless is failing with: This happens when Spotless runs under Java 25 and is a known formatter compatibility issue. Local formatting and spotless checks pass under JDK 21, which is the intended build JDK for OpenSearch. Please verify CI’s JDK or formatter tooling. |
|
❌ Gradle check result for 4026dac: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (1)
63-63: Unused import:ismatcher is not used in this file.The static import of
isfrom Hamcrest matchers is added but never used anywhere in the file. Consider removing it to keep imports clean.-import static org.hamcrest.Matchers.is;
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (3)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (3)
148-152: LGTM!The
setUp()override correctly callssuper.setUp()first and then waits for cluster health withensureGreen("_all"), which helps stabilize the test by ensuring the cluster is ready before tests run.
154-163: LGTM!The
tearDown()implementation properly:
- Uses try-finally to ensure
super.tearDown()is always called- Applies
LENIENT_EXPAND_OPEN_CLOSED_HIDDENoptions for index deletion (addressing the past review comment)- Cleans up both indices and repositories to prevent stale CI state
104-118: Good fix for the fixture readiness timing issue.The
@BeforeClassmethod with property validation addresses the root cause of flakiness by ensuring all required Azure properties are available before the test class initializes. The validation logic correctly mirrors thecredentials()method checks, including the mutual exclusivity ofkeyandsas_token.
...alClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
Show resolved
Hide resolved
...alClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
Show resolved
Hide resolved
|
❌ Gradle check result for 6481297: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
6481297 to
8be670d
Compare
|
❌ Gradle check result for 8be670d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (2)
67-109: Consider logging exceptions in cleanup to aid CI debugging.The catch block at lines 106-108 silently swallows all exceptions during Azure cleanup. While the comment indicates this is intentional for CI teardown, completely silent failures make debugging difficult when cleanup issues occur.
Additionally, line 83 uses an empty string default
""fortest.azure.keywhen the property is missing. Although the fixture should ensure properties are set, being more defensive (e.g., checking the property exists or at least logging when using defaults) would make the code more maintainable.🔎 Optional improvements for cleanup resilience
} catch (Exception ignored) { - // CI teardown + // CI teardown - log but don't fail the test run + System.err.println("Warning: Azure blob cleanup failed: " + ignored.getMessage()); }For line 83, consider validating the property exists or using a more explicit pattern:
- secureSettings.setString("azure.client.default.key", System.getProperty("test.azure.key", "")); + String key = System.getProperty("test.azure.key"); + if (Strings.hasText(key)) { + secureSettings.setString("azure.client.default.key", key); + }
133-156: Redundant empty string defaults after property validation.Lines 141 and 143 use empty string defaults
""when retrievingtest.azure.sas_tokenandtest.azure.key. However, the firstassertBusyblock (lines 119-131) has already validated these properties are properly set. Using empty string defaults here is redundant and could mask issues if the validation logic changes.🔎 Optional: Remove redundant defaults for clarity
if (hasSasToken) { - secureSettings.setString("azure.client.default.sas_token", System.getProperty("test.azure.sas_token", "")); + secureSettings.setString("azure.client.default.sas_token", System.getProperty("test.azure.sas_token")); } else { - secureSettings.setString("azure.client.default.key", System.getProperty("test.azure.key", "")); + secureSettings.setString("azure.client.default.key", System.getProperty("test.azure.key")); }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-12T18:40:08.452Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20017
File: plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java:256-256
Timestamp: 2025-12-12T18:40:08.452Z
Learning: In the OpenSearch ReactorNetty4 secure HTTP transport tests (plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java), URI limit validation has been moved from the protocol layer to the transport layer, making it protocol-agnostic. The random protocol selection in ReactorHttpClient.https(settings) is intentional to ensure all tests validate correct behavior across HTTP/1.1, HTTP/2, and HTTP/3.
Applied to files:
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.
Applied to files:
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (2)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (2)
159-163: LGTM! Proper test initialization pattern.The
setUp()override correctly callssuper.setUp()first and then ensures the cluster is in a healthy state before each test executes. This helps catch cluster issues early and provides a stable baseline for tests.
165-174: LGTM! Comprehensive per-test cleanup.The
tearDown()override properly:
- Uses
IndicesOptions.LENIENT_EXPAND_OPEN_CLOSED_HIDDENto safely delete test indices while protecting system indices- Removes all repositories to prevent stale state between tests
- Ensures
super.tearDown()runs in a finally block even if cleanup failsThis addresses the flaky test issues by ensuring clean state between test executions.
|
❌ Gradle check result for 1437bc2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (3)
73-84: Validate authentication property before cleanup to avoid confusing errors.The early return on lines 73-75 checks only
containerandaccount, but whentest.azure.sas_tokenis not set, line 83 will use an empty string default fortest.azure.key. If the key property is genuinely missing, passing an empty string to secure settings may cause authentication failures that are harder to diagnose than an early null check.🔎 Suggested improvement
- if (Strings.isNullOrEmpty(container) || Strings.isNullOrEmpty(account)) { + boolean hasSasToken = Strings.hasText(System.getProperty("test.azure.sas_token")); + String key = System.getProperty("test.azure.key"); + + if (Strings.isNullOrEmpty(container) || Strings.isNullOrEmpty(account)) { return; } + + if (!hasSasToken && Strings.isNullOrEmpty(key)) { + return; + } MockSecureSettings secureSettings = new MockSecureSettings(); secureSettings.setString("azure.client.default.account", account); - if (Strings.hasText(System.getProperty("test.azure.sas_token"))) { + if (hasSasToken) { secureSettings.setString("azure.client.default.sas_token", System.getProperty("test.azure.sas_token")); } else { - secureSettings.setString("azure.client.default.key", System.getProperty("test.azure.key", "")); + secureSettings.setString("azure.client.default.key", key); }
100-102: Consider logging cleanup failures instead of silently swallowing all exceptions.The catch-all exception handler on line 100 silently ignores all failures during Azure blob cleanup. While robust cleanup is important for CI, completely silent failures can hide configuration issues, authentication problems, or actual bugs in the cleanup logic itself.
🔎 Suggested improvement
Consider logging at minimum:
- } catch (Exception ignored) { - // CI teardown + } catch (Exception e) { + // Log but don't fail the build during CI teardown + logger.warn("Failed to cleanup Azure blobs during test teardown", e); }Or if truly all failures should be ignored, document why:
} catch (Exception ignored) { - // CI teardown + // Ignore all cleanup failures to prevent test failures in CI when Azure fixture + // is unavailable or credentials expired after test completion }
126-149: Remove unnecessary empty string defaults after property validation.Lines 134 and 136 provide empty string defaults to
System.getProperty(), but the firstassertBusyblock (lines 112-124) already validated that these properties are present and non-empty. The empty string defaults suggest the code doesn't trust its own validation and could mask bugs if properties somehow disappear between checks.🔎 Suggested improvement
final boolean hasSasToken = Strings.hasText(System.getProperty("test.azure.sas_token")); if (hasSasToken) { - secureSettings.setString("azure.client.default.sas_token", System.getProperty("test.azure.sas_token", "")); + secureSettings.setString("azure.client.default.sas_token", System.getProperty("test.azure.sas_token")); } else { - secureSettings.setString("azure.client.default.key", System.getProperty("test.azure.key", "")); + secureSettings.setString("azure.client.default.key", System.getProperty("test.azure.key")); }This way, if the properties are unexpectedly null, you'll get a clear NPE rather than a confusing authentication error from an empty credential.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-12T18:40:08.452Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20017
File: plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java:256-256
Timestamp: 2025-12-12T18:40:08.452Z
Learning: In the OpenSearch ReactorNetty4 secure HTTP transport tests (plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java), URI limit validation has been moved from the protocol layer to the transport layer, making it protocol-agnostic. The random protocol selection in ReactorHttpClient.https(settings) is intentional to ensure all tests validate correct behavior across HTTP/1.1, HTTP/2, and HTTP/3.
Applied to files:
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.
Applied to files:
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (2)
plugins/repository-azure/src/internalClusterTest/java/org/opensearch/repositories/azure/AzureStorageCleanupThirdPartyTests.java (2)
152-156: LGTM: setUp ensures cluster health before tests run.Calling
super.setUp()followed byensureGreen("_all")correctly ensures the test cluster is fully initialized and healthy before test execution begins.
158-167: LGTM: tearDown properly cleans up test state.The try-finally structure ensures
super.tearDown()always executes, and the cleanup correctly usesIndicesOptions.LENIENT_EXPAND_OPEN_CLOSED_HIDDENto avoid inadvertently deleting system indices while removing all test indices and repositories.
|
❌ Gradle check result for 681e085: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…r Azure fixture readiness and cleaning stale state
Signed-off-by: Aman Gautam <amangautam2128@gmail.com>
…stency Signed-off-by: Aman Gautam <amangautam2128@gmail.com>
Signed-off-by: Aman Gautam <amangautam2128@gmail.com>
…er health Signed-off-by: Aman Gautam <amangautam2128@gmail.com>
Signed-off-by: Aman Gautam <amangautam2128@gmail.com>
Signed-off-by: Aman Gautam <amangautam2128@gmail.com> Fix Azure fixture by using MockSecureSettings in test Signed-off-by: Aman Gautam <amangautam2128@gmail.com> Fix Azure cleanup by closing service to terminate reactor threads Signed-off-by: Aman Gautam <amangautam2128@gmail.com> Fix Azure cleanup by closing service to terminate reactor threads Signed-off-by: Aman Gautam <amangautam2128@gmail.com>
Signed-off-by: Aman Gautam <amangautam2128@gmail.com>
681e085 to
6d23601
Compare
|
❌ Gradle check result for 6d23601: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Aman Gautam <amangautam2128@gmail.com>
6d23601 to
04129bd
Compare
|
❌ Gradle check result for 04129bd: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Hello maintainers, Thank you. |
Description
Stabilizes
:plugins:repository-azure:internalClusterTest, which occasionally failedin CI before executing any JUnit tests due to the Azure docker fixture not being
fully ready when the test suite began.
Fix
assertBusy()+ensureGreen()to wait for Azure fixture readinessTesting
for i in {1..20}; do ./gradlew :plugins:repository-azure:internalClusterTest; doneIssue
Fixes #20124
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.